Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 400 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 43.9 KiB |
| Average record size in memory | 112.3 B |
Variable types
| Numeric | 13 |
|---|---|
| Categorical | 1 |
CRIM is highly correlated with ZN and 8 other fields | High correlation |
ZN is highly correlated with CRIM and 4 other fields | High correlation |
INDUS is highly correlated with CRIM and 7 other fields | High correlation |
NOX is highly correlated with CRIM and 8 other fields | High correlation |
RM is highly correlated with LSTAT and 1 other field | High correlation |
AGE is highly correlated with CRIM and 7 other fields | High correlation |
DIS is highly correlated with CRIM and 7 other fields | High correlation |
RAD is highly correlated with CRIM and 3 other fields | High correlation |
TAX is highly correlated with CRIM and 7 other fields | High correlation |
PTRATIO is highly correlated with MEDV | High correlation |
LSTAT is highly correlated with CRIM and 7 other fields | High correlation |
MEDV is highly correlated with CRIM and 7 other fields | High correlation |
CRIM is highly correlated with RAD and 1 other field | High correlation |
ZN is highly correlated with INDUS and 2 other fields | High correlation |
INDUS is highly correlated with ZN and 7 other fields | High correlation |
NOX is highly correlated with INDUS and 5 other fields | High correlation |
RM is highly correlated with LSTAT and 1 other field | High correlation |
AGE is highly correlated with ZN and 4 other fields | High correlation |
DIS is highly correlated with ZN and 5 other fields | High correlation |
RAD is highly correlated with CRIM and 4 other fields | High correlation |
TAX is highly correlated with CRIM and 5 other fields | High correlation |
LSTAT is highly correlated with INDUS and 7 other fields | High correlation |
MEDV is highly correlated with INDUS and 2 other fields | High correlation |
CRIM is highly correlated with INDUS and 5 other fields | High correlation |
ZN is highly correlated with INDUS | High correlation |
INDUS is highly correlated with CRIM and 3 other fields | High correlation |
NOX is highly correlated with CRIM and 3 other fields | High correlation |
RM is highly correlated with MEDV | High correlation |
AGE is highly correlated with CRIM and 2 other fields | High correlation |
DIS is highly correlated with CRIM and 3 other fields | High correlation |
RAD is highly correlated with CRIM and 1 other field | High correlation |
TAX is highly correlated with CRIM and 1 other field | High correlation |
LSTAT is highly correlated with MEDV | High correlation |
MEDV is highly correlated with RM and 1 other field | High correlation |
CRIM is highly correlated with INDUS and 2 other fields | High correlation |
ZN is highly correlated with INDUS and 7 other fields | High correlation |
INDUS is highly correlated with CRIM and 8 other fields | High correlation |
NOX is highly correlated with CRIM and 9 other fields | High correlation |
RM is highly correlated with PTRATIO and 2 other fields | High correlation |
AGE is highly correlated with ZN and 7 other fields | High correlation |
DIS is highly correlated with ZN and 8 other fields | High correlation |
RAD is highly correlated with ZN and 8 other fields | High correlation |
TAX is highly correlated with ZN and 6 other fields | High correlation |
PTRATIO is highly correlated with ZN and 8 other fields | High correlation |
B is highly correlated with CRIM and 1 other field | High correlation |
LSTAT is highly correlated with NOX and 6 other fields | High correlation |
MEDV is highly correlated with ZN and 9 other fields | High correlation |
ZN has 296 (74.0%) zeros | Zeros |
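The percentage in this alert, like the Frequency (%) columns further down, appears to be the count divided by the number of observations (400), shown to one decimal place. A minimal sketch of that arithmetic follows; `share` is a hypothetical helper for illustration, not a pandas-profiling function.

```python
N_OBSERVATIONS = 400  # "Number of observations" in the dataset statistics

def share(count, total=N_OBSERVATIONS):
    """Format count / total as a percentage with one decimal place."""
    return f"{count / total:.1%}"

print(share(296))  # zeros in ZN -> 74.0%
```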
Reproduction
| Analysis started | 2021-12-08 11:27:41.854202 |
|---|---|
| Analysis finished | 2021-12-08 11:28:17.504570 |
| Duration | 35.65 seconds |
| Software version | pandas-profiling v3.1.1 |
| Download configuration | config.json |
CRIM
Numeric
| Distinct | 398 |
|---|---|
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.757190925 |
| Minimum | 0.00906 |
|---|---|
| Maximum | 88.9762 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0.00906 |
|---|---|
| 5-th percentile | 0.028694 |
| Q1 | 0.07782 |
| median | 0.24217 |
| Q3 | 3.5434275 |
| 95-th percentile | 16.89612 |
| Maximum | 88.9762 |
| Range | 88.96714 |
| Interquartile range (IQR) | 3.4656075 |
Descriptive statistics
| Standard deviation | 9.155495507 |
|---|---|
| Coefficient of variation (CV) | 2.436792723 |
| Kurtosis | 35.16111791 |
| Mean | 3.757190925 |
| Median Absolute Deviation (MAD) | 0.207705 |
| Skewness | 5.159668357 |
| Sum | 1502.87637 |
| Variance | 83.82309797 |
| Monotonicity | Not monotonic |
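Several of the derived statistics above can be cross-checked from the other reported figures. The sketch below assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The variable names are illustrative and the inputs are copied from the tables above.

```python
# Inputs copied from the tables above.
mean = 3.757190925
std = 9.155495507
q1, q3 = 0.07782, 3.5434275
minimum, maximum = 0.00906, 88.9762

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 6), round(variance, 4), round(iqr, 7), round(value_range, 5))
```

Each result agrees with the reported value up to the rounding of the inputs.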
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.3337 | 2 | 0.5% |
| 0.01501 | 2 | 0.5% |
| 0.08265 | 1 | 0.2% |
| 0.21977 | 1 | 0.2% |
| 0.06664 | 1 | 0.2% |
| 0.02498 | 1 | 0.2% |
| 0.05515 | 1 | 0.2% |
| 0.11027 | 1 | 0.2% |
| 22.5971 | 1 | 0.2% |
| 0.28955 | 1 | 0.2% |
| Other values (388) | 388 | 97.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.00906 | 1 | 0.2% |
| 0.01096 | 1 | 0.2% |
| 0.01301 | 1 | 0.2% |
| 0.01311 | 1 | 0.2% |
| 0.01381 | 1 | 0.2% |
| 0.01439 | 1 | 0.2% |
| 0.01501 | 2 | 0.5% |
| 0.01538 | 1 | 0.2% |
| 0.01709 | 1 | 0.2% |
| 0.01778 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 88.9762 | 1 | 0.2% |
| 73.5341 | 1 | 0.2% |
| 67.9208 | 1 | 0.2% |
| 51.1358 | 1 | 0.2% |
| 41.5292 | 1 | 0.2% |
| 38.3518 | 1 | 0.2% |
| 37.6619 | 1 | 0.2% |
| 28.6558 | 1 | 0.2% |
| 25.9406 | 1 | 0.2% |
| 25.0461 | 1 | 0.2% |
ZN
Numeric
| Distinct | 23 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.97 |
| Minimum | 0 |
|---|---|
| Maximum | 95 |
| Zeros | 296 |
| Zeros (%) | 74.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 12.5 |
| 95-th percentile | 80 |
| Maximum | 95 |
| Range | 95 |
| Interquartile range (IQR) | 12.5 |
Descriptive statistics
| Standard deviation | 22.79626118 |
|---|---|
| Coefficient of variation (CV) | 2.078054802 |
| Kurtosis | 4.344427362 |
| Mean | 10.97 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.280664508 |
| Sum | 4388 |
| Variance | 519.6695238 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=23)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 296 | 74.0% |
| 20 | 19 | 4.8% |
| 80 | 12 | 3.0% |
| 25 | 9 | 2.2% |
| 22 | 7 | 1.8% |
| 12.5 | 7 | 1.8% |
| 40 | 5 | 1.2% |
| 90 | 5 | 1.2% |
| 30 | 4 | 1.0% |
| 45 | 4 | 1.0% |
| Other values (13) | 32 | 8.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 296 | 74.0% |
| 12.5 | 7 | 1.8% |
| 17.5 | 1 | 0.2% |
| 20 | 19 | 4.8% |
| 21 | 3 | 0.8% |
| 22 | 7 | 1.8% |
| 25 | 9 | 2.2% |
| 28 | 2 | 0.5% |
| 30 | 4 | 1.0% |
| 33 | 4 | 1.0% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 95 | 3 | 0.8% |
| 90 | 5 | 1.2% |
| 85 | 2 | 0.5% |
| 80 | 12 | 3.0% |
| 75 | 1 | 0.2% |
| 70 | 3 | 0.8% |
| 60 | 3 | 0.8% |
| 55 | 3 | 0.8% |
| 52.5 | 2 | 0.5% |
| 45 | 4 | 1.0% |
INDUS
Numeric
| Distinct | 72 |
|---|---|
| Distinct (%) | 18.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.936425 |
| Minimum | 0.46 |
|---|---|
| Maximum | 27.74 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0.46 |
|---|---|
| 5-th percentile | 2.172 |
| Q1 | 5.13 |
| median | 8.56 |
| Q3 | 18.1 |
| 95-th percentile | 19.58 |
| Maximum | 27.74 |
| Range | 27.28 |
| Interquartile range (IQR) | 12.97 |
Descriptive statistics
| Standard deviation | 6.848042068 |
|---|---|
| Coefficient of variation (CV) | 0.6261682468 |
| Kurtosis | -1.211903009 |
| Mean | 10.936425 |
| Median Absolute Deviation (MAD) | 5.33 |
| Skewness | 0.3233090448 |
| Sum | 4374.57 |
| Variance | 46.89568017 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 18.1 | 103 | 25.8% |
| 19.58 | 23 | 5.8% |
| 8.14 | 18 | 4.5% |
| 6.2 | 15 | 3.8% |
| 3.97 | 11 | 2.8% |
| 10.59 | 11 | 2.8% |
| 21.89 | 10 | 2.5% |
| 9.9 | 9 | 2.2% |
| 6.91 | 9 | 2.2% |
| 5.19 | 8 | 2.0% |
| Other values (62) | 183 | 45.8% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.46 | 1 | 0.2% |
| 0.74 | 1 | 0.2% |
| 1.21 | 1 | 0.2% |
| 1.22 | 1 | 0.2% |
| 1.25 | 2 | 0.5% |
| 1.38 | 1 | 0.2% |
| 1.47 | 2 | 0.5% |
| 1.52 | 3 | 0.8% |
| 1.69 | 2 | 0.5% |
| 1.76 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 27.74 | 4 | 1.0% |
| 25.65 | 5 | 1.2% |
| 21.89 | 10 | 2.5% |
| 19.58 | 23 | 5.8% |
| 18.1 | 103 | 25.8% |
| 15.04 | 2 | 0.5% |
| 13.92 | 4 | 1.0% |
| 13.89 | 4 | 1.0% |
| 12.83 | 4 | 1.0% |
| 11.93 | 5 | 1.2% |
CHAS
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 KiB |
| 0 | 371 |
|---|---|
| 1 | 29 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 400 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 400 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 400 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 400 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 92.8% |
| 1 | 29 | 7.2% |
NOX
Numeric
| Distinct | 80 |
|---|---|
| Distinct (%) | 20.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5528165 |
| Minimum | 0.385 |
|---|---|
| Maximum | 0.871 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0.385 |
|---|---|
| 5-th percentile | 0.409 |
| Q1 | 0.449 |
| median | 0.532 |
| Q3 | 0.624 |
| 95-th percentile | 0.74 |
| Maximum | 0.871 |
| Range | 0.486 |
| Interquartile range (IQR) | 0.175 |
Descriptive statistics
| Standard deviation | 0.1154880329 |
|---|---|
| Coefficient of variation (CV) | 0.2089084405 |
| Kurtosis | 0.006077868001 |
| Mean | 0.5528165 |
| Median Absolute Deviation (MAD) | 0.084 |
| Skewness | 0.7618370193 |
| Sum | 221.1266 |
| Variance | 0.01333748574 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.538 | 18 | 4.5% |
| 0.489 | 15 | 3.8% |
| 0.713 | 14 | 3.5% |
| 0.871 | 13 | 3.2% |
| 0.74 | 12 | 3.0% |
| 0.437 | 12 | 3.0% |
| 0.693 | 11 | 2.8% |
| 0.605 | 10 | 2.5% |
| 0.647 | 10 | 2.5% |
| 0.624 | 10 | 2.5% |
| Other values (70) | 275 | 68.8% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.385 | 1 | 0.2% |
| 0.389 | 1 | 0.2% |
| 0.392 | 1 | 0.2% |
| 0.394 | 1 | 0.2% |
| 0.398 | 2 | 0.5% |
| 0.4 | 4 | 1.0% |
| 0.401 | 2 | 0.5% |
| 0.403 | 3 | 0.8% |
| 0.404 | 2 | 0.5% |
| 0.405 | 2 | 0.5% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.871 | 13 | 3.2% |
| 0.77 | 3 | 0.8% |
| 0.74 | 12 | 3.0% |
| 0.718 | 4 | 1.0% |
| 0.713 | 14 | 3.5% |
| 0.7 | 9 | 2.2% |
| 0.693 | 11 | 2.8% |
| 0.679 | 6 | 1.5% |
| 0.671 | 6 | 1.5% |
| 0.668 | 3 | 0.8% |
RM
Numeric
| Distinct | 362 |
|---|---|
| Distinct (%) | 90.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.292165 |
| Minimum | 4.138 |
|---|---|
| Maximum | 8.78 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 4.138 |
|---|---|
| 5-th percentile | 5.304 |
| Q1 | 5.8775 |
| median | 6.2085 |
| Q3 | 6.6205 |
| 95-th percentile | 7.6947 |
| Maximum | 8.78 |
| Range | 4.642 |
| Interquartile range (IQR) | 0.743 |
Descriptive statistics
| Standard deviation | 0.7099234861 |
|---|---|
| Coefficient of variation (CV) | 0.1128265845 |
| Kurtosis | 1.581516657 |
| Mean | 6.292165 |
| Median Absolute Deviation (MAD) | 0.3535 |
| Skewness | 0.6512418176 |
| Sum | 2516.866 |
| Variance | 0.5039913562 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.405 | 3 | 0.8% |
| 6.417 | 3 | 0.8% |
| 6.167 | 3 | 0.8% |
| 6.376 | 2 | 0.5% |
| 6.38 | 2 | 0.5% |
| 6.209 | 2 | 0.5% |
| 6.63 | 2 | 0.5% |
| 6.009 | 2 | 0.5% |
| 6.185 | 2 | 0.5% |
| 6.127 | 2 | 0.5% |
| Other values (352) | 377 | 94.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.138 | 2 | 0.5% |
| 4.368 | 1 | 0.2% |
| 4.628 | 1 | 0.2% |
| 4.652 | 1 | 0.2% |
| 4.88 | 1 | 0.2% |
| 4.903 | 1 | 0.2% |
| 4.906 | 1 | 0.2% |
| 4.926 | 1 | 0.2% |
| 4.963 | 1 | 0.2% |
| 4.97 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.78 | 1 | 0.2% |
| 8.725 | 1 | 0.2% |
| 8.704 | 1 | 0.2% |
| 8.398 | 1 | 0.2% |
| 8.375 | 1 | 0.2% |
| 8.337 | 1 | 0.2% |
| 8.297 | 1 | 0.2% |
| 8.266 | 1 | 0.2% |
| 8.259 | 1 | 0.2% |
| 8.247 | 1 | 0.2% |
AGE
Numeric
| Distinct | 296 |
|---|---|
| Distinct (%) | 74.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 68.086 |
| Minimum | 2.9 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 2.9 |
|---|---|
| 5-th percentile | 18.37 |
| Q1 | 42.375 |
| median | 76.95 |
| Q3 | 93.825 |
| 95-th percentile | 100 |
| Maximum | 100 |
| Range | 97.1 |
| Interquartile range (IQR) | 51.45 |
Descriptive statistics
| Standard deviation | 28.38688769 |
|---|---|
| Coefficient of variation (CV) | 0.4169269407 |
| Kurtosis | -1.012964048 |
| Mean | 68.086 |
| Median Absolute Deviation (MAD) | 20.35 |
| Skewness | -0.5732836116 |
| Sum | 27234.4 |
| Variance | 805.8153925 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 33 | 8.2% |
| 97.9 | 4 | 1.0% |
| 95.4 | 4 | 1.0% |
| 98.2 | 4 | 1.0% |
| 95.6 | 3 | 0.8% |
| 98.9 | 3 | 0.8% |
| 36.6 | 3 | 0.8% |
| 21.4 | 3 | 0.8% |
| 32.2 | 3 | 0.8% |
| 98.8 | 3 | 0.8% |
| Other values (286) | 337 | 84.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.9 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 6.2 | 1 | 0.2% |
| 6.5 | 1 | 0.2% |
| 6.6 | 2 | 0.5% |
| 7.8 | 2 | 0.5% |
| 8.4 | 1 | 0.2% |
| 8.9 | 1 | 0.2% |
| 9.8 | 1 | 0.2% |
| 10 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 33 | 8.2% |
| 99.3 | 1 | 0.2% |
| 99.1 | 1 | 0.2% |
| 98.9 | 3 | 0.8% |
| 98.8 | 3 | 0.8% |
| 98.7 | 1 | 0.2% |
| 98.5 | 1 | 0.2% |
| 98.4 | 2 | 0.5% |
| 98.3 | 2 | 0.5% |
| 98.2 | 4 | 1.0% |
DIS
Numeric
| Distinct | 339 |
|---|---|
| Distinct (%) | 84.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.81946175 |
| Minimum | 1.1296 |
|---|---|
| Maximum | 12.1265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 1.1296 |
|---|---|
| 5-th percentile | 1.439495 |
| Q1 | 2.10915 |
| median | 3.2721 |
| Q3 | 5.2146 |
| 95-th percentile | 7.9549 |
| Maximum | 12.1265 |
| Range | 10.9969 |
| Interquartile range (IQR) | 3.10545 |
Descriptive statistics
| Standard deviation | 2.132444822 |
|---|---|
| Coefficient of variation (CV) | 0.5583102964 |
| Kurtosis | 0.6284975576 |
| Mean | 3.81946175 |
| Median Absolute Deviation (MAD) | 1.31305 |
| Skewness | 1.040768718 |
| Sum | 1527.7847 |
| Variance | 4.547320918 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.7209 | 4 | 1.0% |
| 3.4952 | 4 | 1.0% |
| 5.2873 | 4 | 1.0% |
| 5.4007 | 4 | 1.0% |
| 6.0622 | 3 | 0.8% |
| 3.6519 | 3 | 0.8% |
| 4.8122 | 3 | 0.8% |
| 7.8278 | 3 | 0.8% |
| 6.498 | 3 | 0.8% |
| 6.8147 | 3 | 0.8% |
| Other values (329) | 366 | 91.5% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.1296 | 1 | 0.2% |
| 1.137 | 1 | 0.2% |
| 1.1691 | 1 | 0.2% |
| 1.1742 | 1 | 0.2% |
| 1.1781 | 1 | 0.2% |
| 1.2024 | 1 | 0.2% |
| 1.3163 | 1 | 0.2% |
| 1.3216 | 1 | 0.2% |
| 1.3325 | 1 | 0.2% |
| 1.3449 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.1265 | 1 | 0.2% |
| 10.7103 | 2 | 0.5% |
| 10.5857 | 2 | 0.5% |
| 9.2229 | 1 | 0.2% |
| 9.2203 | 1 | 0.2% |
| 9.1876 | 1 | 0.2% |
| 9.0892 | 1 | 0.2% |
| 8.9067 | 1 | 0.2% |
| 8.7921 | 2 | 0.5% |
| 8.6966 | 1 | 0.2% |
RAD
Numeric
| Distinct | 9 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.4625 |
| Minimum | 1 |
|---|---|
| Maximum | 24 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 5 |
| Q3 | 24 |
| 95-th percentile | 24 |
| Maximum | 24 |
| Range | 23 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 8.687478025 |
|---|---|
| Coefficient of variation (CV) | 0.918095432 |
| Kurtosis | -0.8265621333 |
| Mean | 9.4625 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.023992366 |
| Sum | 3785 |
| Variance | 75.47227444 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 103 | 25.8% |
| 5 | 90 | 22.5% |
| 4 | 90 | 22.5% |
| 3 | 33 | 8.2% |
| 8 | 20 | 5.0% |
| 6 | 17 | 4.2% |
| 2 | 17 | 4.2% |
| 1 | 17 | 4.2% |
| 7 | 13 | 3.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17 | 4.2% |
| 2 | 17 | 4.2% |
| 3 | 33 | 8.2% |
| 4 | 90 | 22.5% |
| 5 | 90 | 22.5% |
| 6 | 17 | 4.2% |
| 7 | 13 | 3.2% |
| 8 | 20 | 5.0% |
| 24 | 103 | 25.8% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 103 | 25.8% |
| 8 | 20 | 5.0% |
| 7 | 13 | 3.2% |
| 6 | 17 | 4.2% |
| 5 | 90 | 22.5% |
| 4 | 90 | 22.5% |
| 3 | 33 | 8.2% |
| 2 | 17 | 4.2% |
| 1 | 17 | 4.2% |
TAX
Numeric
| Distinct | 63 |
|---|---|
| Distinct (%) | 15.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 403.7975 |
| Minimum | 187 |
|---|---|
| Maximum | 711 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 187 |
|---|---|
| 5-th percentile | 222 |
| Q1 | 277 |
| median | 329 |
| Q3 | 666 |
| 95-th percentile | 666 |
| Maximum | 711 |
| Range | 524 |
| Interquartile range (IQR) | 389 |
Descriptive statistics
| Standard deviation | 169.6568156 |
|---|---|
| Coefficient of variation (CV) | 0.4201532095 |
| Kurtosis | -1.112504552 |
| Mean | 403.7975 |
| Median Absolute Deviation (MAD) | 74 |
| Skewness | 0.7030860974 |
| Sum | 161519 |
| Variance | 28783.43508 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 666 | 103 | 25.8% |
| 307 | 33 | 8.2% |
| 403 | 23 | 5.8% |
| 277 | 11 | 2.8% |
| 304 | 11 | 2.8% |
| 264 | 11 | 2.8% |
| 437 | 10 | 2.5% |
| 224 | 9 | 2.2% |
| 233 | 9 | 2.2% |
| 398 | 8 | 2.0% |
| Other values (53) | 172 | 43.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 187 | 1 | 0.2% |
| 188 | 5 | 1.2% |
| 193 | 7 | 1.8% |
| 198 | 1 | 0.2% |
| 216 | 5 | 1.2% |
| 222 | 7 | 1.8% |
| 223 | 4 | 1.0% |
| 224 | 9 | 2.2% |
| 226 | 1 | 0.2% |
| 233 | 9 | 2.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 711 | 4 | 1.0% |
| 666 | 103 | 25.8% |
| 437 | 10 | 2.5% |
| 432 | 6 | 1.5% |
| 430 | 3 | 0.8% |
| 422 | 1 | 0.2% |
| 411 | 2 | 0.5% |
| 403 | 23 | 5.8% |
| 402 | 2 | 0.5% |
| 398 | 8 | 2.0% |
PTRATIO
Numeric
| Distinct | 44 |
|---|---|
| Distinct (%) | 11.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.459 |
| Minimum | 12.6 |
|---|---|
| Maximum | 22 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 12.6 |
|---|---|
| 5-th percentile | 14.7 |
| Q1 | 17.4 |
| median | 18.95 |
| Q3 | 20.2 |
| 95-th percentile | 21 |
| Maximum | 22 |
| Range | 9.4 |
| Interquartile range (IQR) | 2.8 |
Descriptive statistics
| Standard deviation | 2.148104953 |
|---|---|
| Coefficient of variation (CV) | 0.116371686 |
| Kurtosis | -0.1532549156 |
| Mean | 18.459 |
| Median Absolute Deviation (MAD) | 1.25 |
| Skewness | -0.8363241144 |
| Sum | 7383.6 |
| Variance | 4.614354887 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=44)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 20.2 | 111 | 27.8% |
| 14.7 | 24 | 6.0% |
| 21 | 23 | 5.8% |
| 17.8 | 18 | 4.5% |
| 18.6 | 16 | 4.0% |
| 17.4 | 15 | 3.8% |
| 19.2 | 15 | 3.8% |
| 18.4 | 13 | 3.2% |
| 19.1 | 12 | 3.0% |
| 13 | 11 | 2.8% |
| Other values (34) | 142 | 35.5% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.6 | 2 | 0.5% |
| 13 | 11 | 2.8% |
| 13.6 | 1 | 0.2% |
| 14.4 | 1 | 0.2% |
| 14.7 | 24 | 6.0% |
| 14.8 | 3 | 0.8% |
| 14.9 | 4 | 1.0% |
| 15.2 | 8 | 2.0% |
| 15.3 | 2 | 0.5% |
| 15.5 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 2 | 0.5% |
| 21.2 | 10 | 2.5% |
| 21 | 23 | 5.8% |
| 20.9 | 7 | 1.8% |
| 20.2 | 111 | 27.8% |
| 20.1 | 4 | 1.0% |
| 19.7 | 7 | 1.8% |
| 19.6 | 5 | 1.2% |
| 19.2 | 15 | 3.8% |
| 19.1 | 12 | 3.0% |
B
Numeric
| Distinct | 286 |
|---|---|
| Distinct (%) | 71.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 359.455375 |
| Minimum | 0.32 |
|---|---|
| Maximum | 396.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0.32 |
|---|---|
| 5-th percentile | 97.484 |
| Q1 | 376.115 |
| median | 391.575 |
| Q3 | 396.285 |
| 95-th percentile | 396.9 |
| Maximum | 396.9 |
| Range | 396.58 |
| Interquartile range (IQR) | 20.17 |
Descriptive statistics
| Standard deviation | 86.73290591 |
|---|---|
| Coefficient of variation (CV) | 0.2412897732 |
| Kurtosis | 8.565800309 |
| Mean | 359.455375 |
| Median Absolute Deviation (MAD) | 5.325 |
| Skewness | -3.082984701 |
| Sum | 143782.15 |
| Variance | 7522.596967 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 396.9 | 98 | 24.5% |
| 395.56 | 2 | 0.5% |
| 396.21 | 2 | 0.5% |
| 395.63 | 2 | 0.5% |
| 396.06 | 2 | 0.5% |
| 389.71 | 2 | 0.5% |
| 395.24 | 2 | 0.5% |
| 394.72 | 2 | 0.5% |
| 377.07 | 2 | 0.5% |
| 395.11 | 2 | 0.5% |
| Other values (276) | 284 | 71.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.32 | 1 | 0.2% |
| 2.52 | 1 | 0.2% |
| 2.6 | 1 | 0.2% |
| 3.5 | 1 | 0.2% |
| 7.68 | 1 | 0.2% |
| 9.32 | 1 | 0.2% |
| 16.45 | 1 | 0.2% |
| 18.82 | 1 | 0.2% |
| 21.57 | 1 | 0.2% |
| 22.01 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 396.9 | 98 | 24.5% |
| 396.33 | 1 | 0.2% |
| 396.3 | 1 | 0.2% |
| 396.28 | 1 | 0.2% |
| 396.24 | 1 | 0.2% |
| 396.23 | 1 | 0.2% |
| 396.21 | 2 | 0.5% |
| 396.14 | 1 | 0.2% |
| 396.06 | 2 | 0.5% |
| 395.93 | 1 | 0.2% |
LSTAT
Numeric
| Distinct | 365 |
|---|---|
| Distinct (%) | 91.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.668525 |
| Minimum | 1.92 |
|---|---|
| Maximum | 37.97 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 1.92 |
|---|---|
| 5-th percentile | 3.9485 |
| Q1 | 6.99 |
| median | 10.875 |
| Q3 | 16.91 |
| 95-th percentile | 27.266 |
| Maximum | 37.97 |
| Range | 36.05 |
| Interquartile range (IQR) | 9.92 |
Descriptive statistics
| Standard deviation | 7.207046752 |
|---|---|
| Coefficient of variation (CV) | 0.5688939124 |
| Kurtosis | 0.4587133226 |
| Mean | 12.668525 |
| Median Absolute Deviation (MAD) | 4.515 |
| Skewness | 0.9507886989 |
| Sum | 5067.41 |
| Variance | 51.94152288 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.36 | 3 | 0.8% |
| 8.05 | 3 | 0.8% |
| 9.5 | 2 | 0.5% |
| 18.06 | 2 | 0.5% |
| 7.39 | 2 | 0.5% |
| 5.68 | 2 | 0.5% |
| 13.15 | 2 | 0.5% |
| 4.56 | 2 | 0.5% |
| 6.72 | 2 | 0.5% |
| 17.27 | 2 | 0.5% |
| Other values (355) | 378 | 94.5% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.92 | 1 | 0.2% |
| 2.47 | 1 | 0.2% |
| 2.88 | 1 | 0.2% |
| 2.94 | 1 | 0.2% |
| 2.96 | 1 | 0.2% |
| 2.97 | 1 | 0.2% |
| 3.01 | 1 | 0.2% |
| 3.11 | 1 | 0.2% |
| 3.13 | 1 | 0.2% |
| 3.16 | 1 | 0.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 37.97 | 1 | 0.2% |
| 34.77 | 1 | 0.2% |
| 34.41 | 1 | 0.2% |
| 34.37 | 1 | 0.2% |
| 34.02 | 1 | 0.2% |
| 31.99 | 1 | 0.2% |
| 30.81 | 2 | 0.5% |
| 30.63 | 1 | 0.2% |
| 30.62 | 1 | 0.2% |
| 30.59 | 1 | 0.2% |
MEDV
Numeric
| Distinct | 205 |
|---|---|
| Distinct (%) | 51.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.47575 |
| Minimum | 5 |
|---|---|
| Maximum | 50 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 10.39 |
| Q1 | 17.1 |
| median | 21 |
| Q3 | 25 |
| 95-th percentile | 43.81 |
| Maximum | 50 |
| Range | 45 |
| Interquartile range (IQR) | 7.9 |
Descriptive statistics
| Standard deviation | 9.218611279 |
|---|---|
| Coefficient of variation (CV) | 0.4101581162 |
| Kurtosis | 1.562946674 |
| Mean | 22.47575 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 1.132726221 |
| Sum | 8990.3 |
| Variance | 84.98279392 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 13 | 3.2% |
| 22 | 6 | 1.5% |
| 25 | 5 | 1.2% |
| 15.6 | 5 | 1.2% |
| 21.7 | 5 | 1.2% |
| 23.1 | 5 | 1.2% |
| 21.2 | 5 | 1.2% |
| 23.9 | 5 | 1.2% |
| 19.4 | 5 | 1.2% |
| 17.8 | 5 | 1.2% |
| Other values (195) | 341 | 85.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 2 | 0.5% |
| 5.6 | 1 | 0.2% |
| 6.3 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 7.2 | 2 | 0.5% |
| 7.4 | 1 | 0.2% |
| 7.5 | 1 | 0.2% |
| 8.3 | 2 | 0.5% |
| 8.4 | 1 | 0.2% |
| 8.5 | 2 | 0.5% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 13 | 3.2% |
| 48.8 | 1 | 0.2% |
| 48.3 | 1 | 0.2% |
| 46.7 | 1 | 0.2% |
| 46 | 1 | 0.2% |
| 45.4 | 1 | 0.2% |
| 44.8 | 1 | 0.2% |
| 44 | 1 | 0.2% |
| 43.8 | 1 | 0.2% |
| 43.1 | 1 | 0.2% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
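The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is only an illustration, not the code pandas-profiling runs; the helper names are hypothetical, and the five LSTAT and MEDV values are copied from the First rows table.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: the correlation of the rank variables."""
    return correlation(average_ranks(x), average_ranks(y))

lstat = [17.28, 6.21, 4.59, 16.59, 9.09]  # first five LSTAT values
medv = [14.8, 25.0, 41.3, 18.4, 19.8]     # first five MEDV values
print(spearman_rho(lstat, medv))
```

In these five rows MEDV falls every time LSTAT rises, so the result is -1 up to floating-point error, even though the relationship is not a straight line.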
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
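A minimal sketch of that formula in plain Python follows. It also checks the invariance claim by shifting and rescaling one variable. `pearson_r` is a hypothetical helper, and the RM and MEDV values are the first five rows of the First rows table.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n factors in the covariance and standard deviations cancel.
    return cov / (sx * sy)

rm = [6.047, 6.211, 6.943, 6.219, 6.144]  # first five RM values
medv = [14.8, 25.0, 41.3, 18.4, 19.8]     # first five MEDV values

r = pearson_r(rm, medv)
# A change of location and (positive) scale in one variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * v + 3.0 for v in rm], medv)
print(r, r_rescaled)
```

Both printed values agree up to floating-point error. Note that scaling by a negative factor would flip the sign of r.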
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
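The pair-counting definition above can be written out directly. The sketch below implements exactly that formula, which is usually called tau-a. Library implementations often use a tie-corrected variant (tau-b), so results can differ when the data contain ties. `kendall_tau` is a hypothetical helper, and the sample values are the first five RM and MEDV rows of the First rows table.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, i.e. tau-a."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # Pairs tied in x or y count as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

rm = [6.047, 6.211, 6.943, 6.219, 6.144]  # first five RM values
medv = [14.8, 25.0, 41.3, 18.4, 19.8]     # first five MEDV values
print(kendall_tau(rm, medv))  # 8 concordant, 2 discordant of 10 pairs -> 0.6
```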
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.95577 | 0.0 | 8.14 | 0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4 | 307.0 | 21.0 | 306.38 | 17.28 | 14.8 |
| 1 | 0.02875 | 28.0 | 15.04 | 0 | 0.464 | 6.211 | 28.9 | 3.6659 | 4 | 270.0 | 18.2 | 396.33 | 6.21 | 25.0 |
| 2 | 1.22358 | 0.0 | 19.58 | 0 | 0.605 | 6.943 | 97.4 | 1.8773 | 5 | 403.0 | 14.7 | 363.43 | 4.59 | 41.3 |
| 3 | 5.66637 | 0.0 | 18.10 | 0 | 0.740 | 6.219 | 100.0 | 2.0048 | 24 | 666.0 | 20.2 | 395.69 | 16.59 | 18.4 |
| 4 | 0.04544 | 0.0 | 3.24 | 0 | 0.460 | 6.144 | 32.2 | 5.8736 | 4 | 430.0 | 16.9 | 368.57 | 9.09 | 19.8 |
| 5 | 0.10659 | 80.0 | 1.91 | 0 | 0.413 | 5.936 | 19.5 | 10.5857 | 4 | 334.0 | 22.0 | 376.04 | 5.57 | 20.6 |
| 6 | 51.13580 | 0.0 | 18.10 | 0 | 0.597 | 5.757 | 100.0 | 1.4130 | 24 | 666.0 | 20.2 | 2.60 | 10.11 | 15.0 |
| 7 | 3.32105 | 0.0 | 19.58 | 1 | 0.871 | 5.403 | 100.0 | 1.3216 | 5 | 403.0 | 14.7 | 396.90 | 26.82 | 13.4 |
| 8 | 1.05393 | 0.0 | 8.14 | 0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4 | 307.0 | 21.0 | 386.85 | 6.58 | 23.1 |
| 9 | 0.24522 | 0.0 | 9.90 | 0 | 0.544 | 5.782 | 71.7 | 4.0317 | 4 | 304.0 | 18.4 | 396.90 | 15.94 | 19.8 |
Last rows
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 390 | 9.96654 | 0.0 | 18.10 | 0 | 0.740 | 6.485 | 100.0 | 1.9784 | 24 | 666.0 | 20.2 | 386.73 | 18.85 | 15.4 |
| 391 | 14.05070 | 0.0 | 18.10 | 0 | 0.597 | 6.657 | 100.0 | 1.5275 | 24 | 666.0 | 20.2 | 35.05 | 21.22 | 17.2 |
| 392 | 0.11432 | 0.0 | 8.56 | 0 | 0.520 | 6.781 | 71.3 | 2.8561 | 5 | 384.0 | 20.9 | 395.58 | 7.67 | 26.5 |
| 393 | 0.59005 | 0.0 | 21.89 | 0 | 0.624 | 6.372 | 97.9 | 2.3274 | 4 | 437.0 | 21.2 | 385.76 | 11.12 | 23.0 |
| 394 | 0.06860 | 0.0 | 2.89 | 0 | 0.445 | 7.416 | 62.5 | 3.4952 | 2 | 276.0 | 18.0 | 396.90 | 6.19 | 33.2 |
| 395 | 0.03615 | 80.0 | 4.95 | 0 | 0.411 | 6.630 | 23.4 | 5.1167 | 4 | 245.0 | 19.2 | 396.90 | 4.70 | 27.9 |
| 396 | 0.17505 | 0.0 | 5.96 | 0 | 0.499 | 5.966 | 30.2 | 3.8473 | 5 | 279.0 | 19.2 | 393.43 | 10.13 | 24.7 |
| 397 | 6.65492 | 0.0 | 18.10 | 0 | 0.713 | 6.317 | 83.0 | 2.7344 | 24 | 666.0 | 20.2 | 396.90 | 13.99 | 19.5 |
| 398 | 0.13117 | 0.0 | 8.56 | 0 | 0.520 | 6.127 | 85.2 | 2.1224 | 5 | 384.0 | 20.9 | 387.69 | 14.09 | 20.4 |
| 399 | 0.06466 | 70.0 | 2.24 | 0 | 0.400 | 6.345 | 20.1 | 7.8278 | 5 | 358.0 | 14.8 | 368.24 | 4.97 | 22.5 |